


A Related Work

Neural Information Processing Systems

The latest CL-based CF methods roughly fall into two research lines: "augmentation-based" and "loss-based" approaches. The prevailing augmentation-based paradigm employs user-item bipartite graph augmentations to generate contrasting views. Despite the remarkable success of CL-based CF methods, there remains a lack of theoretical understanding, particularly regarding the superior generalization ability of contrastive loss. B.4 Top-K evaluation metric: Discounted Cumulative Gain (DCG) is a commonly used ranking metric in top-K recommendation. In DCG, an item's contribution to the utility decreases logarithmically with its position in the ranked list. The training set comprises 311,704 user-selected ratings ranging from 1 to 5. The test set includes ratings for ten songs randomly exposed to each user.
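The logarithmic discount described above can be made concrete with a minimal sketch (function names are illustrative, not from the paper):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted Cumulative Gain: each item's relevance is discounted
    logarithmically by its 1-indexed rank position."""
    return sum(rel / math.log2(pos + 1)
               for pos, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """Normalize by the ideal ordering so scores lie in [0, 1]."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A relevant item at rank 1 contributes its full relevance; the same item at rank 3 contributes only 1/log2(4) = half of it.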





On Negative-aware Preference Optimization for Recommendation

Ding, Chenlu, Liu, Daoxuan, Wu, Jiancan, Hu, Xingyu, Wu, Junkang, Wang, Haitao, Wang, Yongkang, Wang, Xingxing, Wang, Xiang

arXiv.org Artificial Intelligence

Recommendation systems leverage user interaction data to suggest relevant items while filtering out irrelevant (negative) ones. The rise of large language models (LLMs) has garnered increasing attention for their potential in recommendation tasks. However, existing methods for optimizing LLM-based recommenders face challenges in effectively utilizing negative samples. Simply integrating large numbers of negative samples can improve ranking accuracy and mitigate popularity bias but often leads to increased computational overhead and memory costs. Additionally, current approaches fail to account for the varying informativeness of negative samples, leading to suboptimal optimization performance. To address these issues, we propose NAPO (Negative-Aware Preference Optimization), an enhanced framework for preference optimization in LLM-based recommendation. NAPO introduces two key innovations: (1) in-batch negative sharing, which expands the pool of negative samples without additional memory overhead, and (2) dynamic reward margin adjustment, which adapts model updates based on the confidence of negative samples. Extensive experiments on three public datasets demonstrate that NAPO outperforms existing methods in both recommendation accuracy and popularity bias reduction.


Evaluating Performance and Bias of Negative Sampling in Large-Scale Sequential Recommendation Models

Prakash, Arushi, Bermperidis, Dimitrios, Chennu, Srivas

arXiv.org Artificial Intelligence

Large-scale industrial recommendation models predict the most relevant items from catalogs containing millions or billions of options. To train these models efficiently, a small set of irrelevant items (negative samples) is selected from the vast catalog for each relevant item (positive example), helping the model distinguish between relevant and irrelevant items. Choosing the right negative sampling method is a common challenge. We address this by implementing and comparing various negative sampling methods - random, popularity-based, in-batch, mixed, adaptive, and adaptive with mixed variants - on modern sequential recommendation models. Our experiments, including hyperparameter optimization and 20x repeats on three benchmark datasets with varying popularity biases, show how the choice of method and dataset characteristics impact key model performance metrics. We also reveal that average performance metrics often hide imbalances across popularity bands (head, mid, tail). We find that commonly used random negative sampling reinforces popularity bias and performs best for head items. Popularity-based methods (in-batch and global popularity negative sampling) can offer balanced performance at the cost of lower overall model performance results. Our study serves as a practical guide to the trade-offs in selecting a negative sampling method for large-scale sequential recommendation models. Code, datasets, experimental results and hyperparameters are available at: https://github.com/apple/ml-negative-sampling.
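Two of the compared strategies, random and global-popularity negative sampling, can be sketched as follows (an illustrative sketch with assumed names, not the API of the linked repository):

```python
import random
from collections import Counter

def sample_negatives(catalog, positives, interactions, k, method="random"):
    """Draw k negative items for one positive example.
    catalog: all item ids; positives: this user's relevant items;
    interactions: all observed interactions, used to estimate popularity."""
    candidates = [i for i in catalog if i not in positives]
    if method == "random":
        # Uniform over the catalog: tends to pick tail items as
        # negatives, which reinforces popularity bias toward head items.
        return random.sample(candidates, k)
    if method == "popularity":
        # Weight by observed frequency (with +1 smoothing), so popular
        # items are more often used as negatives; sampled with replacement.
        counts = Counter(interactions)
        weights = [counts[i] + 1 for i in candidates]
        return random.choices(candidates, weights=weights, k=k)
    raise ValueError(f"unknown method: {method}")
```

In-batch sampling, by contrast, reuses the positives of other examples in the batch as negatives, which implicitly follows the popularity distribution.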


Preference Diffusion for Recommendation

Liu, Shuo, Zhang, An, Hu, Guoqing, Qian, Hong, Chua, Tat-seng

arXiv.org Artificial Intelligence

Recommender systems predict personalized item rankings based on user preference distributions derived from historical behavior data. Recently, diffusion models (DMs) have gained attention in recommendation for their ability to model complex distributions, yet current DM-based recommenders often rely on traditional objectives like mean squared error (MSE) or recommendation objectives, which are not optimized for personalized ranking tasks or fail to fully leverage DMs' generative potential. To address this, we propose PreferDiff, a tailored optimization objective for DM-based recommenders. PreferDiff transforms BPR into a log-likelihood ranking objective and integrates multiple negative samples to better capture user preferences. Specifically, we employ variational inference to handle the intractability by minimizing the variational upper bound, and replace MSE with cosine error to improve alignment with recommendation tasks. Finally, we balance learning generation and preference to enhance the training stability of DMs. PreferDiff offers three key benefits: it is the first personalized ranking loss designed specifically for DM-based recommenders, and it improves ranking performance and accelerates convergence by addressing hard negatives. We also prove that it is theoretically connected to Direct Preference Optimization, which indicates that it has the potential to align user preferences in DM-based recommenders via generative modeling. Extensive experiments across three benchmarks validate its superior recommendation performance and commendable general sequential recommendation capabilities. Our code is available at https://github.com/lswhim/PreferDiff.
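The cosine error that the abstract describes as replacing MSE can be sketched as follows (illustrative only, not the authors' exact implementation):

```python
import numpy as np

def cosine_error(x_hat, x0):
    """1 - cosine similarity between the denoised prediction x_hat and
    the target x0, along the last axis. Unlike MSE, it is insensitive
    to embedding magnitude and depends only on direction, which suits
    dot-product/cosine-based ranking."""
    num = np.sum(x_hat * x0, axis=-1)
    den = np.linalg.norm(x_hat, axis=-1) * np.linalg.norm(x0, axis=-1)
    return 1.0 - num / den
```

Scaling a prediction by a positive constant leaves the cosine error unchanged, whereas MSE would penalize it; only the ranking-relevant direction is trained.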


Interpretable Triplet Importance for Personalized Ranking

He, Bowei, Ma, Chen

arXiv.org Artificial Intelligence

Personalized item ranking has been a crucial component contributing to the performance of recommender systems. As a representative approach, pairwise ranking directly optimizes the ranking with user implicit feedback by constructing (user, positive item, negative item) triplets. Several recent works have noticed that treating all triplets equally can hardly achieve the best effects. They assign different importance scores to negative items, user-item pairs, or triplets, respectively. However, almost all the generated importance scores are groundless and hard to interpret, thus far from trustworthy and transparent. To tackle these issues, we propose the Triplet Shapley, a Shapley value-based method to measure triplet importance in an interpretable manner. Due to the huge number of triplets, we transform the original Shapley value calculation into a Monte Carlo (MC) approximation, for which an unbiasedness guarantee is also provided. To stabilize the MC approximation, we adopt a control-variates-based method. Finally, we utilize the triplet Shapley values to guide the resampling of important triplets to benefit model learning. Extensive experiments are conducted on six public datasets involving classical matrix factorization- and graph neural network-based recommendation models. Empirical results and subsequent analysis show that our model consistently outperforms the state-of-the-art methods.
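The generic permutation-based Monte Carlo Shapley estimator that the abstract builds on can be sketched as follows (the paper's control-variates stabilization is omitted, and all names are illustrative):

```python
import random

def mc_shapley(players, utility, num_permutations=200, seed=0):
    """Unbiased MC approximation of Shapley values: average each
    player's marginal contribution over random permutations.
    players: list of ids (e.g. triplets); utility: coalition -> float
    (e.g. validation performance of a model trained on the coalition)."""
    rng = random.Random(seed)
    values = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        perm = players[:]
        rng.shuffle(perm)
        coalition, prev = [], utility([])
        for p in perm:
            coalition.append(p)
            cur = utility(coalition)
            values[p] += cur - prev  # marginal contribution of p
            prev = cur
    return {p: v / num_permutations for p, v in values.items()}
```

Each permutation yields one unbiased sample of every player's marginal contribution, so the estimator converges to the exact Shapley values as the number of permutations grows; the control-variates step in the paper reduces the variance of this average.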